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Abstract 

Background: The chloroplast-localized ribulose-1, 5-biphosphate carboxylase/oxygenase (Rubisco), the primary 
enzyme responsible for autotrophy, is instrumental in the continual adaptation of plants to variations in the 
concentrations of C0 2 . The large subunit (LSU) of Rubisco is encoded by the chloroplast rbcL gene. Although 
adaptive processes have been previously identified at this gene, characterizing the relationships between the 
mutational dynamics at the protein level may yield clues on the biological meaning of such adaptive processes. 
The role of such coevolutionary dynamics in the continual fine-tuning of RbcL remains obscure. 

Results: We used the timescale and phylogenetic analyses to investigate and search for processes of adaptive 
evolution in rbcL gene in three gymnosperm families, namely Podocarpaceae, Taxaceae and Cephalotaxaceae. To 
understand the relationships between regions identified as having evolved under adaptive evolution, we 
performed coevolutionary analyses using the software CAPS. Importantly, adaptive processes were identified at 
amino acid sites located on the contact regions among the Rubisco subunits and on the interface between 
Rubisco and its activase. Adaptive amino acid replacements at these regions may have optimized the holoenzyme 
activity. This hypothesis was pinpointed by evidence originated from our analysis of coevolution that supported 
the correlated evolution between Rubisco and its activase. Interestingly, the correlated adaptive processes between 
both these proteins have paralleled the geological variation history of the concentration of atmospheric C0 2 . 

Conclusions: The gene rbcL has experienced bursts of adaptations in response to the changing concentration of 
C0 2 in the atmosphere. These adaptations have emerged as a result of a continuous dynamic of mutations, many 
of which may have involved innovation of functional Rubisco features. Analysis of the protein structure and the 
functional implications of such mutations put forward the conclusion that this evolutionary scenario has been 
possible through a complex interplay between adaptive mutations, often structurally destabilizing, and 
compensatory mutations. Our results unearth patterns of evolution that have likely optimized the Rubisco activity 
and uncover mutational dynamics useful in the molecular engineering of enzymatic activities. 

Reviewers: This article was reviewed by Prof. Christian Blouin (nominated by Dr W Ford Doolittle), Dr Endre Barta 
(nominated by Dr Sandor Pongor), and Dr Nicolas Galtier. 



* Correspondence: tingwang@wbgcas.cn; suyj@mail.sysu.edu.cn 
1 CAS Key Laboratory of Plant Germplasm Enhancement and Specialty 
Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 
China 

7 State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen 
University, Guangzhou, China 

Full list of author information is available at the end of the article 

O© 201 1 Sen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons 
BiolVlGCl Central Attribution License (http://creativecommons.Org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in 
any medium, provided the original work is properly cited. 



Sen et al. Biology Direct 201 1, 6:29 
http://www.biology-direct.eom/content/6/1/29 



Page 2 of 1 9 



Background 

In spite of its slow non-specific catalysis the chloroplast- 
localized ribulose-l,5-biphosphate carboxylase/oxygenase 
(Rubisco, EC 4.1.1.39), the most abundant protein in 
nature, is the primary enzyme responsible for autotro- 
phy [1]. Rubisco is a bi-functional enzyme catalyzing 
both the carboxylation of D-ribulose-l,5-bisphosphate 
(RuBP) that initiates photosynthetic C0 2 fixation and 
the oxygenation of RuBP that starts the nonessential 
photo-respiratory pathway [2]. Its holoenzyme in green 
algae and higher plants consists of eight large subunits 
(LSUs) encoded by the chloroplast (cp) gene rbcL and 
eight small subunits (SSUs) encoded by the nuclear 
gene rbcS. Active sites are formed at the intra-dimer 
interfaces among the C-terminal, a/p barrel domain of 
one large subunit and the N-terminal domain of another 
[3]. C0 2 and 0 2 mutually compete as substrates for the 
active sites, and the ratio of carboxylation to oxygena- 
tion ultimately affects the efficiency of net carbon assim- 
ilation [4]. Understanding of the molecular evolution of 
rbcL genes may shed light on the functional/structural 
features governing Rubisco activity. The knowledge is 
also paramount from the biotechnological point of view 
since photosynthesis is tightly linked to the delicate bal- 
ance between carboxylation and oxygenation. Subtle 
alteration of this balance can have a significant impact 
upon photosynthetic productivity [5]. 

Most of the evolutionary changes optimizing Rubis- 
co's function have been likely subjected to selection 
forces owing to the direct relationship between this 
function and the biological fitness of the plant. Most 
molecular changes favouring the enzyme function 
would be fixed by adaptive evolution, while the changes 
compromising its activity would be removed by purify- 
ing selection. Following the inverse rational, investiga- 
tion of adaptive evolution processes in Rubisco can aid 
in identifying key changes in its function. Adaptive 
evolution has seldom been observed in cp genes [6], 
mainly due to: i) Low rates of evolution of protein-cod- 
ing cp genes [7], which challenges the sensitivity of the 
methods to identify selection due to the lack of statisti- 
cal power [8]; ii) With the exception of few cases [6], 
cp genes present a low propensity to undergo duplica- 
tion [9], a fundamental mechanism in generating the 
source of novel functions [10]; and iii) The lack of 
models that realistically parameterize the evolution of 
cp genes, known to present codon usage bias and 
hence to yield misleading selection results when inap- 
propriate models are applied [11,12]. Nonetheless, the 
use of more realistic models may prove successful 
in identifying true adaptive molecular changes in cp 
genes, many of which would be linked to plant adaptive 
radiations [13]. 



Rubisco enzymatic efficiency has been shown to be 
fine-tuned in diverse autotrophy species [14]. Significant 
positive selection events have been also identified in the 
RbcL subunit of most land plant lineages by using simu- 
lation approaches [15]. In particular, the adaptive evolu- 
tion of rbcL genes has been found in the aquatic plant 
Potamogeton [16] and the F-type lineage of Conocepha- 
lum [17]. These findings make it plausible to hypothe- 
size that RbcL subunit may have undergone "continual 
fine-tuning" in green plants to adapt to C0 2 concentra- 
tion changes across geological epochs [18]. How are 
these adaptive process reflected at the molecular level? 
To understand and answer these questions, adaptive 
evolution analysis must be complemented with the iden- 
tification of coevolutionary dynamics that can highlight 
the intricate co-adaptive relationships between residues 
in the protein under an estimated timescale [14,19]. In 
this respect, current molecular evolutionary methods 
such as the identification of coevolutionary sites [20] 
and relaxed molecular clock inference models [21,22] 
offer a unique opportunity to dissect the rbcL gene 
fine-tuning. 

Podocarpaceae comprises members that extend both 
southern and northern hemispheres, accounting for 
nearly 14% of the gymnosperm diversity. The family is 
predominantly occupying mesic temperate and sunny tro- 
pical mountain habitats [23]. Taxaceae (also known as 
taxads or the yew family), including 25 species, is a wide- 
spread albeit locally endangered gymnosperm family [24]. 
The genus Cephalotaxus was previously listed in the 
Taxaceae [25]. However, recent molecular evidences con- 
firmed that it should be treated as a family (Cephalotaxa- 
ceae) [26]. Since these three families possess different 
contemporary net diversification, characterizations of the 
evolutionary processes in key genes may help to unearth 
the dynamics of their species diversification. In the pre- 
sent study we conduct an exhaustive analysis of such evo- 
lutionary dynamics and address the following questions: i) 
Whether and to what extent have adaptive evolution 
played a role in the evolutionary innovation of the rbcL 
gene; ii) How important have this evolutionary force been 
in the ecological diversification of the three gymnosperm 
families (Taxaceae, Cephalotaxaceae and Podocarpaceae); 
and iii) What is the coevolutionary pattern within the 
RbcL subunit of these families? 

Results 

Alignments and sequence characteristics 

The alignment comprised 1269 positions (423 codons) of 
the rbcL gene. Gaps were excluded from the sequences. 
The model most suited to explain the substitution 
dynamics and the phylogenetic tree relating all the 
sequences of this multiple sequence alignment (MSA) 
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was determined by Modeltest 3.7 [27] and Datamonkey 
2010 website (http://www.datamonkey.org) [28]. 

Dating the phylogenetic divergence events 

Two monophyletic clades of Taxaceae-Cephalotaxaceae 
and Podocarpaceae were identified by the Bayesian ana- 
lysis, both supported with high posterior probabilities 
(Figure 1). Within Podocarpaceae, besides Podocarpus 
three well-supported genera were obtained: Nageia, 
Dacrycarpus and Dacrydium. Dacrydium was basal to 
the remain Podocarpaceae species. The genus Podocar- 
pus diverged into two groups, Podocarpus I and Podo- 
carpus II. In the Taxaceae-Cephalotaxaceae clade, four 
major lineages were resolved. Among them the Torreya- 
Amentotaxus lineage was sister to the remainder; the 
family Cephalotaxaceae was sister to the three remaining 
genera of the family Taxaceae; and the genus Taxus was 
the sister to the Pseudotaxus-Austrotaxus lineage. These 
results indicated that the divergence between Taxaceae- 
Cephalotaxaceae and Podocarpaceae represented the 
first major split during the evolution of the three 
families and that Cephalotaxaceae is nested in Taxaceae. 

The absolute divergence time and evolutionary rates 
of all nodes in the tree were estimated under the uncor- 
rected lognormal model (UCLD), and the global atmo- 
spheric C0 2 concentrations were also mapped according 
to the estimated timescale (Figure 1, Table 1). This ana- 
lysis provided a time estimate of 204 Mya to the most 
recent common ancestor (MRCA) for the extant Podo- 
carpaceae (Figure 1). The results demonstrated that the 
splitting between the genera Nageia and Podocarpus 
occurred during early Paleocene following the Cretac- 
eous-Tertiary (K/T) extinction event, after which the 
genus Podocarpus segregated into two groups {Podocar- 
pus I and II). The lineage Torreya-Amentotaxus diverged 
from the remainder of the other members in Taxaceae- 
Cephalotaxaceae during the Jurassic; Cephalotaxaceae 
departed with the rest three genera in Taxaceae during 
the Cretaceous; and Taxus split with Pseudotaxus-Aus- 
trotaxus during the Tertiary. 

Consistent adaptive evolution in the phylogeny of rbcL 

The primary proportion of the sites (p 0 > 89%, co 0 < 0.1) in 
the three families were under purifying selection (Figure 2, 
Table 2). A small proportion of sites (p 2 = 2%, co 2 = 2.92) 
were under positive selection (Table 2). Seven sites (A11V, 
Q14K, K30Q, S95N, V99A, I133L, and L225I) were signifi- 
cantly identified as positively selected sites under both 
PAML and Selecton programs via random-site models, 
and three more (G86D, S143A, and T262V) were found to 
have evolved under adaptive evolution using the branch- 
site models. One site (K30Q) was also recognized as posi- 
tively selected via conservative nested models (Mla/M2a). 
The significant level of nested models was assessed by the 



likelihood ratio test (Table 2). The j 2 test indicated that 
all the alternative hypotheses (M2a, M3, and M8) in the 
random-site models significantly outperformed their com- 
parable null test (Mia, MO, and M7/M8a). Moreover, the 
branch-site model A (co 2 estimated) fit five branches 
(Podocarpaceae, Podocarpus I and II, Taxaceae, and 
Taxus) significantly better than its null model (co 2 = 1 
fixed) (Table 2 and 3). Podocarpaceae and Podocarpus I 
and II were also identified to be under positive selection, 
even after correcting for multiple tests by the Bonferroni 
correction [29] (Table 3). By contrast, when Torreya and 
Cephalotaxaceae were specified as the foreground 
branches, both the test and null models had similar log- 
likelihood values, indicating that there was not enough evi- 
dence to support adaptive diversifying scenarios. 

Four sites (A11V, G86D, V99A, and I133L) were 
detected under adaptive evolution along the ancestral 
branch of Podocarpaceae (Figure 3). The ancestor of 
Podocarpus I presented two positively selected candidate 
sites (A11V and Q14K); however, neither of them was 
significantly supported by the posterior probabilities. As 
for Podocarpus II, two sites (S143A and T262V) were 
identified under selection with strong statistical sup- 
ports. In addition, three sites (A11V, V99A, and I133L) 
were found to be positively selected on the branch lead- 
ing to the ancestor of Taxaceae, whereas only one 
(K30Q) was identified in the ancestor of Taxus. 

Inter-dependent evolution of amino acid sites in the RbcL 
subunit 

Analysis of coevolution in RbcL using CAPS identified 
21 co-evolution groups (Figure 4). A coevolution group 
was defined to include all those sites that presented coe- 
volution signal with all the other sites within the same 
group: if site A was coevolving with B, B with C and A 
with C, then all three sites were included within one 
coevolution group. The largest group (Figure 5) included 
four sites (11V, 14K, 19D, and 56A), while the smallest 
contained only two. The residues belonging to most of 
the co-evolving pairs presented significant correlations 
with their physicochemical properties, including hydro- 
phobicity, molecular weight, or both of them. For exam- 
ple, four of the co-evolution pairs (11V/19D, 14K/19D, 
11V/86D, and 50P/219V) detected among the 21 co- 
evolution groups exhibited correlated hydrophobicity; 
meanwhile, the other four pairs (14L/258R, 14K/142P, 
95N/145S, and 255V/251M) correlated in their molecu- 
lar weight variance. 

Discussion 

Adaptive evolution of the rbcL gene in the three 
gymnosperm families 

The random-site models [30] were employed to examine 
the adaptive evolution of the rbcL gene in three 
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Figure 1 The phylogenetic tree inferred from rbcL sequences under the UCLD model Geologic timescale is tagged above the 
phylogenetic tree (Unit: Million years). The estimated global atmospheric C0 2 concentrations from the GEOCARB III model are mapped under 
the phylogenetic tree according to the geologic timeline. Each node in the tree is numbered. Posterior probability values are shown along the 
branches and those with posterior probability > 0.9 are heavily thickened. Two time intervals are demonstrated in grey. The six major clades 
concerned in this research are indicated: Podocarpaceae, Podocarpus I and II, Cephalotaxaceae, Taxus and Torreya. The length of each branch is 
in proportion to the divergence time estimated by using the UCLD model. 
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Table 1 Parameters estimated under the UCLD model 


Node No. 


Posterior 


Estimated divergence 


Estimated rates 


Geologic period 




probability 


time (Mya) 


(10~ 10 nt/year) 




1 


100 


8.62 


1.52 


Neogene 


7 


99.97 


29.61 


2.85 


Paleogene/Neogene 


13 


100 


49.65 


2.69 


Paleogene 


25 


95.51 


52.95 


1.69 


Paleogene 


26 


100 


19.98 


2.06 


Neogene 


27 


99.99 


66.75 


2.46 


Paleogene 


28 


99.99 


61.78 


1.99 


Paleogene 


30 


100 


134.19 


4.66 


Lower Cretaceous 


32 


100 


9.13 


2.41 


Paleogene 


35 


99.67 


31.9 


1.96 


Paleogene 


36 


100 


39.88 


2.94 


Paleogene 


38 


100 


34.74 


1.28 


Paleogene 


39 


99.99 


69.45 


1.65 


Cretaceous/Paleogene 


40 


96.18 


8.18 


1.59 


Neogene 


47 


99.29 


21 .02 


1.56 


Mpnnpnp 

1 N S3 V_-j S3 1 IS3 


49 


98.09 


4.12 


1.48 


Neogene/Quaternary 


50 


98.77 


21.05 


1.52 


Neogene 


51 


100 


30.85 


2.53 


Paleogene/Neogene 


54 


98.18 


1.47 


2.01 


Neogene/Quaternary 


55 


100 


9.98 


2.47 


Neogene/Quaternary 


64 


100 


20.17 


2.57 


Neogene 


65 


99.76 


136.64 


1.58 


Lower Cretaceous 


67 


100 


168.52 


2.89 


Middle Jurassic 


70 


99.97 


174.16 


1.97 


Middle Jurassic 



Results for the well supported nodes in Figure 1 are shown. 
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Figure 2 Estimated parameters under the M8 model using Selecton web-server. Approximate posterior means of co are weighted by the 
posterior probabilities. Sites are numbered according to the reference sequence from Taxus mairei (GenBank accession number: AY450856). 
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Table 2 Parameter estimates from tests for selection 



Model 


np 


I 


Parameters 


Positively selected sites 


MO: One ratio 


137 


-5654.77 


co = 0.173 


None 


M1a: Nearly neutral 


138 


-5539.34 


p 0 = 0.885, co 0 = 0.062; p-, = 0.1 14, co, = 1 


Not allowed 


M2a: Positive selection 


140 


-5529.73 


p 0 = 0.892, co 0 = 0.07; p 0 = 0.085, ^ = 1; p 2 = 
0.023, co 2 = 2.92 


K30Q 


M3: Discrete 


139 


-5536.84 


p 0 = 0.92, co 0 = 0.08; p n = 0.079, co, = 1 .46 


A1 1V, Q14K, E28Q, K30Q, G86D, S95N, V99A, I133L, 
L225I, 1251 M, K305R 


M7: p 


138 


-5547.37 


p = 0.015, g = 0.074 


Not allowed 


M8: p 


140 


-5527.82 


p 0 = 0.96; p = 0.09, g = 0.38; p-, = 0.039, w = 2.20 


A1 1V, Q14K, K30Q, S95N, V99A, I133L, L225I 


Branch-site models 


Taxaceae- 
Cephalotaxaceae 


Model A: co 2 = 1, fixed 


139 


-5533.19 


p o =0.857, £»o=O.O53;p 1 =O.O61 # £0 1 =1;p 2 a+P2b = 

0.082, co 2 = 1 


Not allowed 


Model A: co 2 
estimated 


140 


-5530.95 


p o =0.891, w o =0.06;p 1 =0.078,w 1 =1 ;p 2a +p 2b = 0.03, 
co 2 = 2.55 


A11V, V99A, I133L 


Taxus 


Model A: co 2 = 1, fixed 


139 


-5535.98 


p o =0.852, w o =0.057;p l =0.106,w l =1 ;p 2a +p 2b = 

0.042, co 2 = 1 


Not allowed 


Model A: w 2 
estimated 


140 


-5533.12 


p o =0.86, e> 0 =0.06;p l =0.103,ft> l =1 ;p 2a +p 2b = 0.027, 
co 2 = 5.9 


K30Q 


Torreya 


Model A: co 2 = 1, fixed 


139 


-5538.23 


p 0 =0.82, w o =0.059;pi=0.106,ft>i=1 ;p 2a +p 2b = 0.073, 
co 2 = 1 


Not allowed 


Model A: w 2 
estimated 


140 


-5538.33 


p o =0.82, co 0 = 0.059^=0.1 06,£» 1 = 1;p 2a +P2b = 

0.071, co 2 = 1.02 


None 


Podocarpaceae 


Model A: co 2 = 1, fixed 


139 


-5527.9 


p o =0.84, w o =0.049;p l =0.057,w l =1 ;p 2a +p 2b = 0.099, 
co 2 = 1 


Not allowed 


Model A: w 2 
estimated 


140 


-5523.16 


p o =0.89, c0<,=O.O65;p 1 =O.O63 # ft> 1 =1;p 2 a+P2b = 0-047, 
co 2 = 2.41 


Al IV, G86D, V99A, I133L 


Podocarpus 1 


Model A: co 2 = 1, fixed 


139 


-5537.56 


p o =0.786, w o =0.05;p 1 =0.084,w 1 =1 ;p 2a +p 2b = 0.129, 
co 2 = 1 


Not allowed 


Model A: w 2 
estimated 


140 


-5533.91 


p 0 =0.727, £w o =0.058;p 1 =0.091 f £» 1 =1;p 2 a+P2b = 

0.181, co 2 = 1.12 


None 


Podocarpus II 


Model A: co 2 = 1, fixed 


139 


-5530.31 


p 0 =0.81, w o =0.054;p 1 =0.093,w 1 =1 ;p 2a +p 2b = 0.097, 
co 2 = 1 


Not allowed 


Model A: w 2 
estimated 


140 


-5526.95 


p 0 =0.85, w o =0.056;p 1 =0.096,w 1 =1 ;p 2a +p 2b = 0.056, 
co 2 = 3.31 


S143A, T262V 


Model 


np 


i 


Parameters 


Positively selected sites 


Cephalotaxaceae 


Model A: co 2 = 1, fixed 


139 


-5537.3 


p 0 =0.82, w o =0.058;p 1 =0.102,w 1 =1 ;p 2a +p 2b = 0.077, 
co 2 = 1 


Not allowed 


Model A: w 2 
estimated 


140 


-5536.84 


p o =0.862, £»o=0.059;p 1 =0.105 f fi> 1 =1;p 2 a+P2b = 

0.032, co 2 = 2.68 


None 



Positive selection sites are identified with posterior probability higher than 95%. 



gymnosperm families Podocarpaceae, Taxaceae and 
Cephalotaxaceae. Our results showed that most sites are 
under purifying selection, while a small part is under 
neutral evolution. 

Recent improvements in the statistical models made it 
feasible to infer ancestral adaptive evolution events 



[31,32]. Using these models we found that the amino 
acid sites in the RbcL subunit of both the ancestor of 
Podocarpaceae and that of Taxaceae-Cephalotaxaceae 
had undergone adaptive evolution. By contrast, no adap- 
tive evolution was detected in the ancestor of Cephalo- 
taxaceae. Moreover, within Podocarpaceae, no positively 
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Table 3 Tests for selection with Bonferroni correction 



Model 


2A£ 


d.f. 


p 


Sig. a = 0.05 


Bonferroni correction 


Sig. a = 0.05 


M0-M3 


235.86 


2 


o 


• 


No need 




M1a-M2a 


19.22 


2 


0.0001 


• 


No need 




M7-M8 


39.1 


2 


o 




|\ln nppH 




M8a-M8 


4.86 


1 


0.0275 


• 


No need 




A-A1 test 














Taxaceae-Cephalotaxaceae 


4.48 




0.0343 


• 


0.0686 


o 


Taxus 


5.72 




0.0168 


• 


0.084 


o 


Torreya 


0.2 




0.6547 


o 


3.2735 


o 


Cephalotaxaceae 


0.92 




0.3375 


o 


1 .6875 


o 


Podocarpaceae 


9.48 




0.0021 


• 


0.0105 


• 


Podocorpus I 


7.3 




0.0069 


• 


0.0345 


• 


Podocorpus II 


6.72 




0.0095 


• 


0.0475 


• 



In the branch-site model, the relevant hypotheses are verified under the Bonferroni correction to avoid false positive conclusion. The sign • denotes the results 
that pass the statistical significance tests, while o stands for the opposite. 
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Figure 3 Positively selected sites in the RbcL subunit of the Podocarpaceae ancestor. Four positive selected sites of the Podocarpaceae 
ancestor are highlighted in red arrows. The 3D imagines of the ancestral and current amino acid residues are represented in purple and 
cyanine, respectively. The ancestral amino acid residues are inferred by the DAMBE package and the ancestral state reconstruction (ASR) 
molecule on Datamonkey 2010 website for the Podocarpaceae ancestral node [28,82]. The three domains are colour coded differently. Positively 
selected sites are indicated with lines, whereas the potential ones are with black dotted lines. 
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Coevolving pairs fl Residues in loop 

Coevolving pairs by hydrophobicity | Residues in helix 



Coevolving pairs by molecular weight | Residues in sheet 

Coevolving pairs by both hydrophobicity and molecular weight I Residues connecting two structures 

Figure 4 Coevolutionary networks in the RbcL subunit of the three gymnosperm families. Residues in the networks are sorted clockwise 
in an ascending order depending on the number of coevolutionary interactions that each amino acid residue establishes. The domains to which 
these amino acid sites belong are colour-coded. Nodes (amino acid residues) are connected through edges differently according to the nature 
and characteristics of their coevolution. 
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Coevolving pairs 

Coevolving pairs by hydrophobicity 

Coevolving pairs by both hydrophobicity and molecular weight 

Figure 5 Coevolving sites within the N-terminal structure of 

the RbcL subunit. Amino acid residues involving in the 

coevolutionary network are highlighted red in the 3D structural 

diagram. And the residues are connected differently according to 

the nature and characteristics of their coevolution. 
\ J 



selected site was found in the ancestral branch of Podo- 
carpus I; however, two were detected in the branch 
leading to Podocarpus II (Figure 1). In Taxaceae, for the 
ancestor of Taxus, site 30 was identified as positively 
selected; but no such a site was found in the ancestor of 
Torreya. Totally, ten positively selected sites (A11V, 
Q14K, K30Q, G86D, S95N, V99A, I133L, S143A, L225I, 
and T262V) were identified in all the tests. 

To fully understand the mechanism of the molecular 
adaptation of rbcL gene, both the structural and the 



functional significance of those ten positively selected 
sites need to be elucidated (Table 4). Seven (A11V, 
Q14K, K30Q, I133L, S143A, L225I, and T262V) of the 
positively selected sites are located at the interface of 
Rubisco subunits, whereas the other three (G86D, S95N, 
and V99A) are at the interface between Rubisco and its 
activase (Table 2 and 4). 

Most of the amino acid sites detected to be under 
adaptive evolution have been previously identified to 
have important functional roles. Of these adaptively 
selected sites, site 11 shows an adaptive substitution 
from Alanine to Valine. Valine has an additional methy- 
lene group (-CH 2 -) in comparison with Alanine, allows 
a stronger van de Waal binding with the other residues 
at the LSU 2 C-terminus [33], tightening therefore the 
combination of LSU 2 dimer [34]. In addition, lysine and 
glutamine present side-chains that are positively and 
negatively charged, respectively [35]. Therefore the 
replacement of Q by K at site 14, identified to have 
evolved under positive selection, may create extra ionic 
bonds with its proximal amino acid sites [36]. Impor- 
tantly, site 30 is very close to site 28 [3], both possessing 
hydrophilic side-chains (Table 2). Kapralov and Filatov 
(2007) noted that site 28 is highly prone to fix amino 
acid replacements by positive selection in green plants 
[15]. Future examinations on sites 28 and 30 by using 
site-directed mutagenesis may unveil whether their 
hydrophilic side-chain substitutions have brought advan- 
tageous effects. The replacements at sites 133 and 143 
may improve the stability of LSU 2 dimer by modifying 
side-chain physicochemical characters [37,38]. More- 
over, the nonsynonymous substitutions on sites 225 and 
262 could contribute to the linkage of LSU and SSU 



Table 4 Functional roles of the amino acid sites under adaptive evolution 







Location 






Site 


Domain 


Interface 


Functional roles 


References 


Al IV 


loop 1 




To contribute to the holoenzyme thermal stability, catalytic efficiency, and 
C0 2 /0 2 specificity 


Kellogg and Juliano (1997) 
[3] 


Q14K 








Ott et ol. (2000) [106] 


K30Q 


loop 2 


LSU dinners 




Du and Spreitzer (2000) 
[107] 


I133L 


PE 






Spreitzer and Salvucci 
(2002) [36] 


S143A 


aD 






Spreitzer et ol. (2005) [64] 


L225I 


al 


LSUs and SSUs 




Makowski et ol. (2008) [33] 


T262V 


P4 








G86D 


PC 


Rubisco and its 
activase 


To affect the contacts between the Rubisco and its activase 


Du et ol. (2003) [108] 


S95N 


PD 






Portis (2003) [40] 


V99A 








Portis et ol. (2008) [42] 



All positively selected amino acid sites in Table 3 are included. 



Sen et al. Biology Direct 201 1, 6:29 
http://www.biology-direct.eom/content/6/1/29 



Page 10 of 19 



subunits (Table 4). Recent genetic analyses of a hybrid 
Rubisco pointed to that the combination of LSU and 
SSU have effects on the holoenzyme expression [39]. 

Rubisco activase is involved in the opening of the 
closed state of the Rubisco form to release RuBP, produ- 
cing the active enzyme form [40]. The physical contact 
between the Rubisco site 89 and the activase site 
314 has been reported by Li et al (2005) [41]. Moreover, 
the negatively charged Rubisco site 93 is potentially 
complementary to the positively charged activase site 
312 [42]. Hence, changes of the three neighbouring sites 
(G86D, S95N, and V99A) around sites 89 and 93 may 
affect the contacts between the Rubisco and its activase. 
In summary, positively selected sites identified in the 
rbcL gene of the three gymnosperm families are located 
either on the contact surface between Rubisco subunits 
or between the Rubisco and its activase. Nonsynon- 
ymous substitutions among them have great potentials 
to optimize the enzyme characteristics. 

RbcL gene is located in the large single copy region of 
chloroplast genome [43,44]. It has been regarded as a 
region with no evidence for adaptive evolution [45], but 
recent research has brought this into question [15]. Cur- 
rently, Ohno's model is frequently used to describe pro- 
tein evolution and adaptation [46], which emphasizes 
the role that gene duplication plays in the adjustment of 
protein functions [47,48]. The "gene sharing" model 
stresses that it is at the single copy gene stage, namely 
before the occurrence of gene duplication, that func- 
tional genes have already adapted [49]. For instance, stu- 
dies on the eye lens crystallins showed that the gene 
duplication indeed occurred after the protein had 
diverged from its ancestral function [50,51]. This study 
on the rbcL gene further unveils adaptive evolutionary 
processes in the single copy chloroplast gene. It has 
been known that for rbcL, only a few nonsynonymous 
substitutions, especially those at catalytic active sites on 
the LSU and SSU interface or between the Rubisco 
and its activase interface, can significantly change the 
Rubisco features (e.g. thermal stability, catalytic 
efficiency, and C0 2 /0 2 specificity) [52]. Indeed, the 
positively selected sites we identified in the three gym- 
nosperm families are all located at these positions. 
Adaptive substitutions at the sites, possibly generated 
during chloroplast DNA replication or repair [53], may 
impose great potential for optimizing the Rubisco func- 
tional/structural characteristics. As shown in Table 2, 
our results suggest that: i) mutations at most sites in 
rbcL {po = 89.2%, co 0 = 0.07) may be deleterious; ii) 5.8% 
of the substitutions probably have no significant effect 
on fitness; and iii) 2.3% of the replacements are likely to 
improve the Rubisco performance. It is therefore reason- 
able to postulate that the three gymnosperm families 
may have adaptively responded to habitat pressures by 



adjusting the Rubisco function/structure during their 
evolution. 

Historic adaptation of the rbcL gene under continual 
changing of C0 2 concentration 

The historic accumulation of nucleotide substitutions in 
the rbcL gene has supplied the Podocarpaceae species 
with differentiated Rubisco catalytic efficiency, which 
could contribute to their adaptive radiation into diverse 
ecological niches [54]. By contrast, the absence of rbcL 
adaptation in Cephalotaxus and Torreya may explain why 
their members tend to be locally distributed (Table 2). 

Two positively selected sites were identified at the 
ancestral branch of Podocarpaceae and each was at the 
interface between the Rubisco and its activase (Table 2). 
The global C0 2 concentration has been acutely chan- 
ging since 170 Mya to 65 Mya [55]. Our results showed 
that the ancestor of Podocarpaceae diverged from Taxa- 
ceae-Cephalotaxaceae around 204 Mya; and its descen- 
dants started to diverge about 134 Mya [56]. From 170 
Mya to 134 Mya, the rbcL variants in the ancestor of 
Podocarpaceae that could better adapt to the global 
reduction of C0 2 concentration became consequently 
fixed (Figure 1, Table 2). Sage and Coleman (2001) has 
noted that increasing the expression level of the Rubisco 
and its activase is an effective solution for plants to 
respond to the reduction of C0 2 concentration [57]. 
Here we further demonstrated that the simultaneous 
and inter-dependent adjustment of the two enzymes 
provides an alternative adaptive mode. 

Many transgenic manipulations have been attempted 
to improve Rubisco's catalytic performance, since its 
property largely determines the maximum efficiency of 
photosynthesis in the use of light, water, and fertilizer N 
resources [58]. The function of Rubisco in terrestrial 
plants is identical. Evolutionary pressures seem to have 
driven it towards more efficient utilization of C0 2 , and 
recent analyses have indicated that the optimal perfor- 
mance of Rubisco is basically determined by C0 2 con- 
centration [59,60]. In this study, adaptive evolution of 
the rbcL gene has been detected in Podocarpaceae and 
Podocarpus I and II (Table 3), which implies that the 
Podocarpaceae species have undergone continual modi- 
fication in their RbcL subunit for better fitness along 
with the changing atmosphere C0 2 concentration. The 
positively selected sites identified (Table 3) may also 
benefit future studies for improving the Rubisco effi- 
ciency and for elucidating the interactions between the 
enzyme and its activase [61]. 

Site-specific coevolution in RbcL subunit 

To further understand the complex evolutionary pat- 
terns, we also analysed the inter-dependence among 
amino acid regions in the RbcL subunit. Adaptive 
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evolution is only expected to occur on functionally and 
structurally meaningful amino acid sites where nonsy- 
nonymous substitutions are most likely to destabilise 
and hence to compromise organism's fitness [62,63]. 
Consequently, evolution of biological innovation is 
dependent upon the fixation of mutations close in the 
structure that can compensate the destabilising effects 
of innovative mutations. This may well explain the pat- 
tern we observe in the Rubisco, with many of the muta- 
tions that are under adaptive evolution and that co- 
evolve occurring at the interface of the protein dimer 
where stability is essential for the preservation of 
enzyme function and integrity. In support of the co-evo- 
lutionary hypothesis as a mean to fix innovative muta- 
tions in the RbcL subunit, previous research has shown 
that when multiple amino acid replacements are intro- 
duced, significant changes can occur in both enzyme 
catalytic efficiency and specificity [64]. In our co-evolu- 
tionary analyses, 21 co-evolution groups were identified 
within the RbcL subunit. In particular, evolutionary 
dependencies were recognized among sites belonging to 
different domains (Figure 5). Some correlated pairs (e.g. 
25 1M and 255 V) were linearly proximal, whereas the 
others (e.g. 19D and 56A) were linearly distant but 
structurally proximal. Among the co-evolving sites, we 
also identified structurally and linearly distant sites (e.g. 
14 K and 86D) (Figure 4 and 5). Although certain stu- 
dies have attributed a biological meaning for the linearly 
proximal sites [65], many other reports elucidated that 
physical connections can be established between distant 
functional sites in the quaternary protein structures 
[66-69]. 

Conclusions 

This research presents substantial evidence that point to 
a complex adaptive process associated with the func- 
tional innovation of the Rubisco protein. This process 
involves a continuous checking of the structural and 
functional consequences of the fixation of novel muta- 
tions as well as the amelioration of the effects by such 
mutations through compensatory replacement events. 
Several other conditions related to the population genet- 
ics of the individuals have to be met in order for a com- 
pensatory mutation to be fixed before the individuals 
carrying the destabilising mutations are purged by selec- 
tion. Among the possible hypothetical scenarios is the 
relaxation of the action of selection or the action of 
other buffering mechanisms. Species from both Taxa- 
ceae and Cephalotaxaceae are characterized by their 
small population sizes and clonality, which makes it 
feasible for genetic drift to act and facilitate the fixation 
of the mutations regardless of their immediate fitness 
consequences. Although this seems to be a possible sce- 
nario, several points remain to be investigated in future 



studies such as to assess the real inter-dependence 
among mutations through directed mutagenesis or to 
examine the strength of genetic drift at the population 
levels. Our research however opens exciting new ave- 
nues that may lead to a more complete understanding 
of the functional novelties in the Rubisco among 
gymnosperms. 

Methods 

Plant Sampling 

Plant materials sampled for this investigation (Table 5), 
including Cephalotaxus sinensis (Rehder & E. H. Wilson) 
H. L. Li, C. hainanensis H. L. Li, C. fortunei Hook., 
C oliveri Mast., Taxus chinensis (Pilg.) Rehder, T. yunna- 
nensis W. C. Cheng & L. K. Fu, T. mairei (Lemee & 
H. Lev.) S. Y. Hu ex T. S. Liu, Amentotaxus yunnanensis 
H. L. Li, A. argotaenia (Hance) Pilg., Torreya fargesii var. 
yunnanensis (W. C. Cheng & L. K. Fu) N. Kang, Pseudo- 
taxus chienii (W. C. Cheng) W. C. Cheng, Podocarpus 
macrophyllus (Thunb.) Sweet, P. neriifolius D. Don, Dacry- 
carpus imbricatus (Blume) de Laub., and Dacrydium pier- 
rei de Laub., were collected from South China Botanical 
Garden, Chinese Academy of Sciences, Guangzhou, China. 
In addition, Nageia nagi (Thunb.) Kuntze was collected 
from the campus of Sun Yat-sen University, Guangzhou, 
China. 

Total genomic DNA extraction 

Genomic DNA was extracted from leaves of individual 
trees by a modified cetyltrimethyl ammonium bromide 
(CTAB) method. 0.2 g fresh leaf tissue was ground to fine 
powder with a mortar and pestle in liquid nitrogen. The 
leaf powder was allotted equally into two Eppendorf tubes. 
One mL -20°C propanone was added to each tube. The 
tube was rocked gently for 1 min, centrifuged at 5,000 rpm 
for 10 min, and the supernatant was discarded; the proce- 
dure was repeated once. Each tube was added with 1 mL 
60°C CTAB extraction buffer, mixed well, and incubated 
at 80°C for 4 hours. After incubation, 500 \A chloroform: 
isoamyl alcohol (24: 1) was added, mixed and then centri- 
fuged at 13,000 rpm for 10 min. The top aqueous phase 
was removed to a new tube, added with 2/3 volume cold 
isopropanol, and mixed gently to precipitate DNA. DNA 
was dissolved in 70 \i\ TE buffer, and the quality was 
determined by 1% agarose gel electrophoresis. 

PCR amplification and DNA sequencing 

PCR amplification of rbcL sequences was carried out in 
100 \A volumes containing 50 mM KC1, 10 mM Tris-HCl 
(pH 8.0), 0.1% Triton X-100, 1.5 mM MgCl 2 , 0.2 mM 
each deoxynucleoside triphosphate, 2 U Taq DNA poly- 
merase, 0.3 (iM primer, 30 ng genomic DNA, and DNA- 
free water. The thermo-cycling program was set for 5 
min at 95°C, 35 cycles of 1 min at 94°C, 2 min at 54°C, 3 
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Table 5 Species and GenBank accession numbers for the rbcL gene sequences analysed in this study 



Family 



Genus 



Species 



GenBank Acc. No. 



Taxaceae 



Taxus 



Pseud otoxus 
Austrotoxus 
Torreyo 



Cephalotaxaceae 



Amentotoxus 



Cepholotaxus 



Podocarpaceae 



Podocorpus 



Toxus moirei (Lemee & H.Lev.) S. Y. Hu ex T. S. Liu AY450856* 

Taxus yunnonensis W. C. Cheng & L K. Fu AY450857* 

Toxus chinensis (Pilg.) Rehder AY450855* 

Taxus baccata L. AF456388 

Taxus cuspidata Siebold & Zucc. EF660720 

Taxus sumatrana (Miq.) de Laub. EF660706 

Taxus wallichiana Zucc. EF660717 

Taxus canadensis Marshall EF660724 

Taxus fuana N. Li & R. R. Mill EF660725 

Taxus globosa Schltdl. EF66071 0 

Taxus brevi folia Nutt. AF249666 

Taxus x hunnewelliana Rehder EF660723 

Taxus x media Rehder EF660722 

Pseudotaxus chienii (W. C. Cheng) W. C. Cheng AY450858* 

Austrotaxus spicata (R. Br. ) Compton AF456385 

Torreya californica Torr. AY664858 

Torreya taxifolia Am. AF456389 

Torreya nucifera (L.) Siebold & Zucc. AB027317 

Torreya grandis Fortune EF660733 

Torreya fargesii Franch. EF660735 

Torreya fargesii var. yunnanensis (W. C. Cheng & L. K. Fu) N. Kang AY450861* 

Torreya jackii Chun EF660734 

Amentotaxus argotaenia (Hance) Pilg. AY450859* 

Amentotaxus yunnanensis H. L. Li AY450860* 

Amentotaxus formosana Hi. Li EF660708 

Cephalotaxus harringtonia (Knight ex J.Forbes) K. Koch EF660730 

Cephalotaxus sinensis (Rehder & E.H.Wilson) H. L. Li AY450864* 

Cephalotaxus koreana Nakai EF660726 

Cephalotaxus wilsoniana Hayata AB027312 

Cephalotaxus oliveri Mast. AY450865* 

Cephalotaxus hainanensis H. L. Li AY450862* 

Cephalotaxus griffithii Hook. f. EF660704 

Cephalotaxus mannii Hook. f. EF660707 

Cephalotaxus fortune! Hook. AY450863* 

Cephalotaxus latifolia Cheng & L. K. Fu EF660712 

Cephalotaxus fortune! var. alpina H. L. Li EF660714 

Cephalotaxus lanceolata K. M. Feng EF660709 

Cephalotaxus drupacea Siebold & Zucc. EF660716 

Podocarpus lawrencei Hook. f. AF249600 

Podocarpus nivalis Hook. AF249619 

Podocarpus gnidioides Carriere AF249607 

Podocarpus totara G. Benn. ex D. Don AF307931 

Podocarpus acutifolius Kirk AF249599 

Podocarpus saligna D. Don AF249628 

Podocarpus smithii de Laub. AF249629 

Podocarpus hallii Kirk AF249609 

Podocarpus nubigenus Lindl. AF249621 

Podocarpus latifolius (Thunb.) R.Br, ex Mirb. AF249612 

Podocarpus reichei J. Buchholz et N. E. Gray AF479879 

Podocarpus cunninghamii Colenso AF249603 

Podocarpus chinensis Wall, ex J. Forbes AF249602 
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Table 5 Species and GenBank accession numbers for the rbcL gene sequences analysed in this study (Continued) 



Nageia 

Docrycarpus 

Dacrydium 



Outgroup 



Podocorpus fasciculus de Laub. 


AF249622 


Podocorpus off. pilgeri MH6655 


AF249624 


Podocorpus off insuloris MH6612 


AF249611 


Podocorpus mocrophyllus (Jhunb.) D. Don 


AY450866* 


Podocorpus off degeneri MtLoftyBG 


AF249627 


Podocorpus elotus R. Br. ex Endl. 


AF249606 


Podocorpus polystochyus R. Br. ex Endl. 


AF249626 


Podocorpus lucienii de Laub. 


AF249615 


Podocorpus polyspermus de Laub. 


AF249625 


Podocorpus neriifolius D. Don 


AY450867* 


Podocorpus longifoliolotus Pilger 


AF249614 


Podocorpus novoe-coledonioe Viell. 


AF249620 


Podocorpus spinulosus (Sm.) R.Br, ex Mirb. 


AF249630 


Nogeio nogi (Thunb.) Kuntze 


AY450868* 


Nogeio nonkoensis (Hayata) R.R. Mill 


AF249649 


Docrycarpus imbricotus (Thunb.) de Laub. 


AY450869* 


Docrycarpus docrydioides (Rich.) de Laub. 


AF249597 


Dacrydium pierrei de Laub. 


AY450870* 


Pinus koraiensis Siebold & Zucc. 


NC_004677 


Pinus thunbergii Pari. 


NC_001631 


Welwitschio mirabilis Hook. 


NC_010654 


Gnetum parvifolium (Warb.) W. C. Cheng 


NC_011942 


Cycos taitungensis C. F. Shen et al. 


NC_009618 


Ephedra equisetina Bunge 


NC_011954 


Ginkgo biloba L. 


DQ069500 



The sign * denotes sequences experimentally determined in the present study. 



min at 72°C, and 10 min at 72 °C, Negative controls 
where all reagents but DNA were added to the reaction 
mix were included in order to verify the absence of con- 
tamination. The forward and reverse primers were 
S'-ATGTCACCACAAACAGAGACT-S' and 5'-CCTTCA 
TTACGAGCTTGCACAC-3', respectively. PCR products 
and sizes were verified in agarose gels. The purified PCR 
products were sequenced directly in both forward and 
reverse directions. Three repeats of each fragment were 
determined to control for Taq polymerase errors. 

Phylogeny with timescale 

DNA sequences of the coding regions obtained experi- 
mentally plus those retrieved from the public databases 
(Table 5) were multiple aligned using the MUSCLE soft- 
ware [70]. To improve the accuracy of phylogenetic 
inference, we excluded: i) multiple data from identical 
species, ii) sequences containing frame-shift mutations, 
and iii) ambiguously aligned regions. The appropriate 
DNA substitution model was identified by Modeltest 
v.3.7 package [27] via comparing 56 available models 
using the Akaike Information Criterion (AIC). The data 
set was also uploaded to the Datamonkey 2010 website 
for the estimation of the best fit model [28]. The tree 



topology inferred using the appropriate model for rbcL 
sequences was utilized as pre-defined tree in the 
adaptive and non-independent evolution analysis. 

The following nodes within the phylogeny were cho- 
sen to constrain for a rate consistent with the known 
relationships: i) based on the Cratonia cotyledon fossils 
[71], the split of Gnetum and Welwitschia was con- 
strained to 110 Mya (Node 72); ii) the good estimation 
of the split of Taxus and Cephalotaxaceae was con- 
strained to 169 Mya (Node 66) [56]; iii) Node 75 was 
constrained to 225 Mya based on the earliest Pinaceae- 
type seed [72]; iv) Node 73 was constrained to 125 Mya 
based on the earliest Ephedra-type seed [73]. Following 
the suggestion of the authors of BEAST, we ran the 
empty alignment before the real data to avoid misspeci- 
fication of dating and taxon sampling. 

We applied BEAST to infer topology, branch lengths, 
and dates for the rbcL gene. As the relationships of Tax- 
aceae and Cephalotaxaceae families have been histori- 
cally under debates, the prior information on these two 
families was not given before the estimation. A normal 
distribution was applied over the estimating of the abso- 
lute ages via the MCMC process. BEAST runs of 4 x 
10 7 generations, saving data every 1,000 generations, 
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produced 40,000 estimates of dates under a Yule specia- 
tion prior and an uncorrelated relaxed clock [74] for the 
rbcL gene dataset. Convergence statistics was analyzed 
in Tracer, resulting in 36,000 post-burn-in trees. We 
used TreeAnnotator v. 1.5.3 [74] to produce maximum 
clade credibility trees from the post-burn-in trees and to 
determine the 95% probability density of ages for all 
nodes in the tree. 

To illustrate the relationship between the ancestral 
adaptation and the concentration of C0 2 in the atmo- 
sphere, the RC0 2 value along with the time scale was 
mapped under the phylogeny (Figure 1) [55]. 

Detection of positive and negative selection sites 

Identification of adaptive evolution (positive selection) is 
fundamental to our understanding of the process of 
adaptation and diversifying selection. The general con- 
sensus is that nonsynonymous nucleotide substitutions 
{d N )> whose alternatives leading to a change in the 
codon and its corresponding amino acid, can be scaled 
by the number of synonymous replacements (ds), which 
are nucleotide changes that only change the codon but 
not the amino acid and are consequently neutrally fixed 
and proportional to the divergence time between the 
sequences. Because the d N changed the amino acid 
sequence and protein function depending on its struc- 
ture, this parameter is often under the filter of Natural 
selection. It follows consequently that the nonsynon- 
ymous-to-synonymous rates ratio (co = d N /d s ) can be 
considered as a stringent measure of selection [75,76]. 
Positive adaptive evolution occurs episodically during 
the evolution of proteins and this selection signal is 
generally swamped in a background signal of negative 
selection, which makes it difficult to robustly identify 
adaptive evolution. In order to identify signs of adaptive 
evolution we used two maximum-likelihood based mod- 
els implemented in the CODEML program from PAML 
package version 4.1 [32], the random-site model and the 
modified branch-site model, for detecting the positive 
and negative selection sites within rbcL sequences 
among lineages. The random-site model allows the co to 
vary among amino acid sites within the multiple 
sequence alignment and this parameter is estimated by 
maximum-likelihood following Goldman and Yang 
(1994) [77]. Conversely, the branch-site model (BSM) 
accounts for the variation in selective constraints among 
sites and lineages in the phylogeny synchronously. 
Within the BSM, we applied the modified model A test 
to reduce the false positive results as advised in the 
manual file of PAML version 4.1 [78]. In addition to the 
previous models, we also compared codon-based models 
that estimate one or several co values for the different 
categories of codons. We conducted the likelihood ratio 



test (LRT) [79] to compare the different nested models 
(MO vs. M3, Mia vs. M2a, M7 vs. M8, M8a vs. M8, and 
alternative test (co 2 estimated) vs. null test (co 2 - 1, 
fixed)) [80]. In the branch-site models, the branches 
were selected to testify whether the species have a big- 
ger opportunity to undergo an episodic adaptive evolu- 
tion along with the acutely changing atmospheric C0 2 
concentration (Figure 1, grey regions). The following 
seven lineages were selected: (i) Taxaceae-Cephalotaxa- 
ceae, (ii) Taxus; (iii) Torreya; (iv) Podocarpaceae; (v) 
Podocarpus I; (vi) Podocarpus II; and (vii) Cephalotaxa- 
ceae. To address the problem of multiple comparisons, 
the Bonferroni correction was employed during the con- 
tinuous checking with the A-Al models [29]. 

The multiple sequence alignment (MSA) was also sub- 
mitted to the Selecton website [81] for the comparison 
between empirical models and mechanistic empirical 
combination (MEC) models. Moreover, the ancestral 
states of the rbcL sequences were reconstructed via the 
DAMBE package [82] and the ASR module on the Data- 
monkey website [28,83]. The offspring sequences were 
compared with the ancestral sequences on each node. 
And the sequence was submitted to European Bioinfor- 
matics Institute (http://www.isb-sib.ch/ Swiss-model) for 
predicting the three-dimensional structure of the RbcL 
subunit. 

Identification of intra-protein coevolutionary pattern 

To understand the broad implications of the amino acid 
replacements in the RbcL subunit we conducted an analy- 
sis of the evolutionary dependencies among sites to iden- 
tify functional/structural dependencies among residues. If 
two amino acid sites were under adaptive evolution and 
these sites were co-evolving, this may indicate their func- 
tional/structural dependency. Intra-protein co-evolution in 
rbcL was tested via the program CAPS [20]. This algo- 
rithm implemented in this program takes the phylogenetic 
dependencies into account and correct them [84] and has 
been reported to outperform other approaches [85]. 

Broadly, CAPS compares the correlated variance of 
the evolutionary rates at two sites corrected by the time 
since the divergence of the two sequences. The signifi- 
cance of the results was evaluated by randomization of 
pairs in the alignment, calculation of their correlation 
values, and comparison of the real values with the distri- 
bution of 10,000 randomly sampled values. The step- 
down permutation procedure was employed to correct 
for multiple tests and non-independence of data [86]. 
An alpha value of 0.001 was applied to minimize type I 
error. The correlated variability between amino acid 
sites was weighted by the level of substitutions per 
synonymous site in order to normalize parameters by 
the time of sequence divergence [87]. 
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Reviewers' comments 

Reviewer's report 1 

Prof. Christian Blouin (nominated by W Ford Doolittle), 
Dalhousie University Halifax, Nova Scotia, Canada 
This reviewer provided no comments for publication. 

Reviewer's report 2 

Dr Endre Barta (nominated by Sandor Pongor), Interna- 
tional Centre for Genetic Engineering and Biotechnology, 
Trieste, Italy 

Gymnosperm plants played an important role in Earth's 
flora, especially in the prehistoric ages. In this manuscript, 
the authors use elegant molecular evolutionary analyses to 
answer some open questions about the evolutionary pro- 
cesses tailoring the chloroplast-coded rbcL gene during 
the adaptation to the changing C0 2 concentration in the 
atmosphere. The authors use rbcL coding sequences from 
three gymnosperm families. The rbcL is a very specific and 
very constrained protein, coded in the chloroplast genome 
and also being in complex with the rbcS, which is coded in 
the nucleus. The three gymnosperm families are good sub- 
jects for this analysis because i) they represent almost 14% 
of the gymnosperm diversity and can be found globally on 
the Earth, ii) we have fossil records allowing to constrain 
the phylogenetic tree timescale. 

The authors present a robust evolutionary analysis 
based on the multiple alignment of rbcL coding 
sequences from different gymnosperm species. They 
found that a complex adaptation process occurred during 
the evolution of these taxa. They also discussed the struc- 
tural and functional consequences of these processes and 
concluded that certain compensatory replacement muta- 
tions could play important role in the fixation of the 
functionally novel mutations. The analysis is very sound 
and in most cases based on different methods and mod- 
els. The basic idea and the results can be interesting for 
the broader community. 

I have only some theoretical questions and three 
minor technical comments. 

Questions: 

Are there any known examples for the same compen- 
satory mutation pairs from other plant species (i.e. evi- 
dence for convergent evolution)? 

Authors' response: Recent report of the modification 
on both large and small subunits of Rubisco enzyme in 
Flaveria (Asteraceae) might be an example for the simi- 
lar patterns [88]. The two subunits are under selection 
during the evolution from C 3 to C 4 photosynthesis. This 
pattern may be an evidence for convergent evolution 
under ecological pressures. However, the compensatory 
mutation pair is not coincided in the study. 

How could the geographical isolation of different 
populations influence the results of this study? Are 



there any samples (rbcL sequences) from geographically 
well separated plants from the same species? Do you 
expect any polymorphisms at any replacement site in 
the small populations of these gymnosperm species? 

Authors' response: If we take the geographical C0 2 
variation into account, the geographical isolation of dif- 
ferent populations will significantly influence the results 
of the present study. As has been reported previously, the 
rbcL gene has undergone adaptive evolution during the 
radiation in the Hawaiian endemic genus Schiedea, 
which demonstrates that rbcL gene evolved under strong 
positive selection impacted by the geographical isolation 
[13]. Nonetheless, the present research is mainly focused 
on the relations above species level, so the samples frbcL 
sequences) from the same species are excluded in the 
analyses. The polymorphisms at those replacement sites 
probably exist in the small populations of these gymnos- 
perm species. 

The inferred tree topology and the taxonomic classifi- 
cation of the genera in Taxaceae seem to be different. 
How can you explain this? 

Authors' response: Since its lower evolutionary rate, 
the rbcL gene has certain limitations on the deeper phy- 
logenetic levels (e.g. at the genus level) [89]. On the other 
respect, the molecular adaptation in rbcL gene per se 
also has impact on the inferring of the phylogenetic trees 
[17]. The above two factors may be the explanation for 
the disagreement between the inferred tree topology and 
the taxonomic classification. 

Is it possible to deduce from this analysis the ancestral 
sequence of the rbcL gene characteristic for the different 
nodes? 

Authors' response: Positive answer. However, more 
experimental data is required for the inference of the 
rbcL gene characteristic even though the inferring of the 
ancestral sequence from the current ones is of statistical 
efficiency [90,91]. We believe that much more work have 
been left for the further research after our computational 
estimation. 

Comments: 

Reading the abstract at a first glance, it is not clear 
what is the relation between Rubisco and rbcL. Clarify- 
ing this would help the readers who are not familiar in 
plant biology. 

Authors' response: We agree with this remark and 
changed the sentence accordingly. One sentence has been 
added into the abstract especially for the introduction of 
the relation between Rubisco and rbcL gene. 

It is very difficult to review the tables in general, and 
especially the Table 3. Using grids, or re-structuring 
them would help a lot. 

Authors' response: We agree with this remark and re- 
structured Table 3(new Table 2) accordingly. 
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Referencing Figure 4 is before the first reference to 
Figure 3, and no reference for Table 1 in the text. 

Authors' response: We appreciate the constructive 
comment. The order of referencing figures has been re- 
checked. Since Table lshows the original plant materials 
of this research, which cannot be omitted, we changed its 
appearance from the first table (former Table 1) into the 
last one (new Table 5). 

Reviewer's report 3 

Dr Nicolas Galtier, CNRS-Universite Montpellier II 
Laboratoire "Genome, Populations, Interactions, Adapta- 
tion", Montpellier, France 

This manuscript analyses the molecular evolution of 
the essential rbcL gene in three gymnosperm families. 
The functional relevance of amino-acid sites detected as 
positively selected or co-evolving is discussed. Here are 
my major comments: 

The text is quite affirmative regarding divergence 
dates, and their relationship with atmospheric C0 2 
abundance. I am not sure that molecular dating is that 
trustable, even with the use of clock-relaxed models, as 
illustrated by many controversies in the recent literature 
(e.g. Graur & Martin 2004 Trends Genet, Douzery et al. 
2004 PNAS, Peterson et al. 2004 PNAS, Roger & Hug 

2006 Philos Trans, Emerson 2007 Syst Biol), owing to 
paleontological uncertainty and tricky rate/time decou- 
pling [92-96]. Some prudence would appear required 
here, and the uncertainty of date estimates could be dis- 
cussed. This is especially true knowing that the uncorre- 
cted model in BEAST was used here, an approach 
which was criticized in the recent past (Lepage et al. 

2007 Mol Biol Evol) [97]. 

Authors' response: Due to the paleontological uncer- 
tainty and trick rate/time decoupling it is still an unre- 
solved scientific theme whether the modern molecular 
clock has the ability to reconcile the fossil evidence and 
the time estimation [98]. However, as indicated in many 
other literature (e.g. Welch & Bromham 200S Trends 
Ecol Evol, Ho 2007 J Avian Biol, Ho 2009 Biol Lett), lots 
of recent methodological advances have been carried out 
focusing on the topic [99-101]. Specifically, although cor- 
related model (also known as corr elated-rates model, or 
CR model) outperforms the uncorrelated model (also 
known as independent-rates model, or IR model) in the 
instance provided by Lepage et al (2007) [97], several 
authors have noticed that the uncorrelated model is bet- 
ter than the correlated model during estimating the 
dynamics of evolutionary rates in other instances 
[22,102-104]. Moreover, Zhong et al. (2009) argued quite 
recently that the uncorrelated model is superior to the 
correlated model in guesstimating the episodic rate accel- 
eration in ancestral plant lineages [105]. Collectively, all 



the above conclusions indicate that the modern molecu- 
lar clock relied on uncorrelated model is applicable for 
our present study on the gymnosperm plants. 

The reason for species sampling in this study is not 
obvious. Just three families were (thoroughly enough) 
sampled, when the focus of the study is on rbcL adapta- 
tion during the > 200 Mya of gymnosperm evolution. A 
more balanced sampling across gymnosperm families 
might help corroborate some of the results reported 
here. 

Authors' response: The three families (Podocarpaceae, 
Taxaceae and Cephalotaxaceae) represent over 14% of 
the gymnosperm diversity and can be found globally on 
the Earth [23,24]. Moreover, reliable fossil records can be 
obtained to calibrate molecular clock for dating the time 
of the phylogenetic trees [56]. So the thoroughly enough 
sampled species in the three families could partially 
represent adaptive and co evolutionary patterns of rbcL 
gene in the related gymnosperms under geological time- 
line. And we also believe that further research including 
other families will shed new lights on the big thesis. 

Along the same lines, it would be good to know 
whether the sites identified as positively selected or coe- 
volving in gymnosperms behave similarly in angios- 
perms (and perhaps other groups of plants), for which a 
huge database of rbcL sequences is available. 

Authors' response: As far as we can see, the atmo- 
spheric C0 2 concentration is one important factor (also 
known as ecological pressure) related to the adaptation 
of Rubisco enzyme [14,60]. Nevertheless, other factors 
also have impact on the evolution of this enzyme. For 
instance, the C 3 /C 4 photosynthesis in angiosperms have 
effects on the modification o/rbcL gene [5]and rbcS gene 
[88]. The comparison analyses merely along the identical 
timeline, ignoring other ecological pressures, may mislead 
the conclusions. Since the above reasons, we only focus 
our sampling on the present families. 

The discussion emphasizes potential adaptive pro- 
cesses, in possible connection to C0 2 availability across 
time. I note that if rbcL evolution was related to atmo- 
spheric C0 2 variations, we would expect adaptive evolu- 
tion to occur simultaneously in contemporary branches 
of the tree, in line with sudden RC0 2 changes. Such a 
pattern is not clearly detected, so I wonder what in the 
data makes the author link rbcL evolution to atmo- 
spheric C0 2 concentration, especially knowing that the 
adaptative signal is not prominent. 

Authors' response: The ability to undertake adaptive 
evolution depends on several factors. Gymnosperm plants 
played an important role in Earth's flora, especially in 
the prehistoric ages. This implies that the species of the 
three gymnosperm families have a higher feasibility to 
undergo adaptation in the prehistoric branches. Along 
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with the rising of angio sperms, members from the con- 
temporary branches of gymnosperm plants are character- 
ized by their small population sizes, which make them 
feasible for genetic drift The current analysis results and 
the biological background drew us a big imagination of 
the ancestral rbcL gene adaptation associating with the 
variations of the atmospheric C0 2 concentration. 

List of abbreviations used 

Rubisco: ribulose-1, 5-biphosphate carboxylase/oxygenase; RuBP: D-ribulose- 
1, 5-bisphosphate; cp: chloroplast; RbcL: large subunit of Rubisco enzyme; 
RbcS: small subunit of Rubisco enzyme; LSU: large subunit; SSU: small 
subunit; MSA: multiple sequence alignment; Mya: million years ago; MRCA: 
the most recent common ancestor; K/T extinction: Cretaceous-Tertiary 
extinction; UCLD: uncorrelated lognormal model. 
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